
Sed Man
Posted - 2010.06.22 05:17:00 - [2]
Originally by: Hawk TT

So, what options does CCP have?

a) Giving up on Stackless Python - it would mean rewriting millions of lines of code in... what? C++? Very unlikely, even if they had the R&D budget of Google or Micro$oft.

b) Throwing faster CPU cores at the problem - this would provide limited scalability, because modern CPUs go multi-core, they don't go to 10 GHz, and the x86 architecture has very limited potential to improve in terms of IPC (instructions per cycle).

c) Throwing faster node interconnects like InfiniBand at it, plus some HPC management stuff, so they could move SOL servers on the fly (not only during downtime) and on demand - at the least they could dedicate CPU cores to systems with large fleet battles. Jita has a dedicated CPU core and node, but Jita is predictable. Fleet battles are not so predictable, but CCP could at least gather statistics over time and dedicate CPU cores to specific "hot" or "violent" 0.0 systems. I guess CCP Yokai will announce improvements in this direction soon.

d) Removing (detaching) secondary services from the SOL servers - Market, People & Places, Mail etc. These are I/O-intensive services and they put extra load on the SOL nodes: CPU cycles, I/O, networking etc. I guess CCP has already accomplished detaching Market & Mail...

e) Last, but not least - employing GPGPUs on the server nodes to off-load thousands of "tiny calculations" from the CPUs. x86 CPU cores are intrinsically serial; GPGPUs are intrinsically parallel. The question is whether it is feasible to "export" some of the Stackless math functions to the GPGPU, and whether the penalties and overhead of moving data between CPU and GPGPU would be offset by the GPGPU's better raw performance. It seems worth experimenting with - the current SOL nodes use the old Intel FSB architecture, and their effective memory bandwidth/latency is pretty comparable to PCIe 2.0 performance. Once the data and the math code are in the GPGPU's GDDR5 buffer, the calculations can be done in a massively parallel manner while the host CPU has free cycles for other stuff...
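To make option (a) concrete, here's a minimal sketch of the Stackless model the quote is talking about: game logic runs as thousands of cheap cooperative tasklets inside a single OS process, which is exactly why it is so hard to spread one busy system across cores. The solar_system_tick function and the system names are hypothetical, purely for illustration.

    import stackless

    def solar_system_tick(name, ticks):
        # Hypothetical per-system update loop: each solar system is a
        # cooperative tasklet, not an OS thread, so thousands fit in
        # one process - but they all share a single CPU core.
        for tick in range(ticks):
            # ... physics, AI and module cycles would be simulated here ...
            print("%s: tick %d" % (name, tick))
            stackless.schedule()  # yield control to the next tasklet

    stackless.tasklet(solar_system_tick)("Jita", 3)
    stackless.tasklet(solar_system_tick)("Amarr", 3)
    stackless.run()  # round-robin until every tasklet has finished

Because that scheduler is cooperative and single-threaded, throwing more cores at a node does nothing for one overloaded system - which is what motivates options (c) and (e).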
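Option (c)'s "gather statistics and dedicate cores" idea is also easy to sketch. The load numbers, system names and node count below are invented - this is just a greedy assignment over observed per-system CPU usage, not CCP's actual remapping logic.

    # Hypothetical per-system CPU load observed over the past week
    # (percent of a core). Real input would come from cluster monitoring.
    observed_load = {
        "Jita": 95.0, "HED-GP": 80.0, "Amarr": 45.0,
        "EC-P8R": 60.0, "Rens": 30.0, "Hek": 25.0,
    }

    DEDICATED_NODES = 3  # nodes reserved for "hot" systems

    def plan_dedicated_nodes(load, n):
        # Pin the n hottest systems to their own node; the rest share.
        ranked = sorted(load, key=load.get, reverse=True)
        return ranked[:n], ranked[n:]

    hot, shared = plan_dedicated_nodes(observed_load, DEDICATED_NODES)
    for i, system in enumerate(hot):
        print("node-%d dedicated to %s (%.0f%% load)"
              % (i, system, observed_load[system]))
    print("shared nodes host: " + ", ".join(shared))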
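And for option (e), a rough sketch of what offloading batched "tiny calculations" could look like. This assumes PyCUDA, which the quote doesn't mention (CCP's actual stack is unknown), and a made-up workload: squared distances for N objects, computed element-wise on the GPU after one bulk transfer across PCIe.

    import numpy as np
    import pycuda.autoinit            # creates a CUDA context on import
    import pycuda.gpuarray as gpuarray

    N = 100000  # hypothetical number of objects in a big fleet fight

    # One bulk host->device copy per coordinate axis; the per-object
    # math then runs as parallel CUDA kernels instead of a serial loop.
    x = gpuarray.to_gpu(np.random.rand(N).astype(np.float32))
    y = gpuarray.to_gpu(np.random.rand(N).astype(np.float32))
    z = gpuarray.to_gpu(np.random.rand(N).astype(np.float32))

    dist_sq = (x * x + y * y + z * z).get()  # compute on GPU, copy back once
    print("first squared distance: %f" % dist_sq[0])

The trade-off the quote raises is visible right here: the transfers (to_gpu and .get()) are pure overhead, so the batch has to be large enough for the parallel math to pay for the trip across PCIe.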
This is why I wonder why not an IBM pSeries box... a 595 or even an older 690. LPARs can dynamically share CPU from the pool, and capacity moves where it's needed - you can't do that with a Wintel cluster. A new 595 with POWER CPUs (8 cores) would run the whole thing, all comms over the virtual switches run at InfiniBand/memory speed, and with VIO you can present plenty of HBAs and MPIO paths up to each LPAR. Many advantages over blades/blade centers. And if you have two of these 595s and the right software/firmware, you can dynamically move all the LPARs from one chassis to another... Anyhow, I'm sure CCP have it all under control... even though it is on Windows...
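As a rough illustration of that shared-pool behaviour (a toy model, not IBM's PowerVM scheduler - the partition names and numbers are invented): uncapped LPARs draw idle entitlement from a common pool as their demand shifts, and only get scaled back when the pool is oversubscribed.

    def share_pool(pool_cores, demand):
        # Toy shared-processor-pool model: every partition gets what it
        # asks for if the pool is big enough; otherwise capacity is
        # split in proportion to demand (roughly how uncapped LPARs
        # borrow idle cycles - not IBM's actual algorithm).
        total = sum(demand.values())
        if total <= pool_cores:
            return dict(demand)
        scale = pool_cores / float(total)
        return {lpar: d * scale for lpar, d in demand.items()}

    # Quiet hour: the partition hosting a fleet fight spikes, others idle.
    print(share_pool(8.0, {"sol-jita": 1.0, "sol-0.0-fight": 6.5, "market": 0.5}))
    # Oversubscribed: all demands are scaled back proportionally.
    print(share_pool(8.0, {"sol-jita": 4.0, "sol-0.0-fight": 6.0, "market": 2.0}))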